custom ner model
Towards identifying Source credibility on Information Leakage in Digital Gadget Market
Kumaru, Neha, Gupta, Garvit, Mongia, Shreyas, Singh, Shubham, Kumaraguru, Ponnurangam, Buduru, Arun Balaji
The use of social media to share content is constantly rising. One adverse effect of information sharing on social media is the spread of sensitive information in the public domain. With the digital gadget market becoming highly competitive and ever-evolving, a growing number of social media posts leak sensitive information about devices. Many web-blogs on the digital gadget market have mushroomed recently, making the problem of information leaks all-pervasive. Credible leaks on the specifics of an upcoming device can cause considerable financial damage to the respective organization. Hence, it is crucial to assess the credibility of the platforms that continuously post smartphone or digital gadget leaks. In this work, we analyze the headlines of leak web-blog posts and their corresponding official press releases. We first collect 54,495 leak and press-release headlines for different smartphones. We train our custom NER model to capture the evolving smartphone names with an accuracy of 82.14% on manually annotated results. We further propose a credibility score metric for the web-blog, based on the number of falsified and authentic smartphone leak posts.
Implementing Hearst Patterns with SpaCy
In this article, I will mostly concentrate on Hearst patterns, their implementation, and their usage for hypernym extraction. However, I will also use Named Entity Recognition (NER) and a dataset of patents, so I recommend checking my previous post in this series. Why do we care about patterns in the context of NLP? Because they significantly reduce and simplify the work; a pattern is, in essence, a simple model. Even in the era of Transformer neural networks, patterns can still be beneficial.
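To make the idea of a Hearst pattern concrete, here is a minimal sketch of the classic "X such as A, B and C" pattern. It is an assumption-laden simplification: it uses Python's standard `re` module instead of SpaCy, it treats only the single word before "such as" as the hypernym, and the function name `extract_hypernyms` is hypothetical. A SpaCy-based version would match on noun chunks rather than raw words.

```python
import re

# Simplified Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Assumption: the hypernym is the single word right before "such as".
SUCH_AS = re.compile(r"(\w+),? such as ([^.;:]+)")

# Splits an enumeration on commas and on the conjunctions "and" / "or".
SPLIT = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_hypernyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for item in SPLIT.split(match.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    return pairs


sentence = "The patent covers devices such as smartphones, tablets and smart watches."
for hyponym, hypernym in extract_hypernyms(sentence):
    print(f"{hyponym} -> {hypernym}")
```

Even this rough version shows why patterns are attractive: a single rule yields several hyponym/hypernym pairs without any training data.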
How to build a custom NER Model?
Named Entity Recognition (NER) is a Natural Language Processing technique used to extract named entities from a given text and classify them under predefined classes. Put simply, NER extracts entities such as person names, location names, company names, etc. from a given text. NER is important for information retrieval. After reading a text, humans can naturally recognize common entities such as a person's name, a date, and so on. For a computer to do the same, we have to help it learn the task. To do so, we can draw on Natural Language Processing (NLP) and Machine Learning (ML).
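The input and output of NER can be illustrated with a toy lookup-based tagger. This is only a sketch of what "extract and classify" means, not a trained model: the `GAZETTEER` entries, the labels, and the function name `tag_entities` are all hypothetical, and a real custom NER model would learn to recognize entities it has never seen in a list.

```python
import re

# Hypothetical gazetteer mapping each predefined class to known phrases.
GAZETTEER = {
    "PERSON": ["Ada Lovelace"],
    "ORG": ["Acme Corp"],
    "LOC": ["London"],
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface_text, label) tuples sorted by position."""
    entities = []
    for label, phrases in gazetteer.items():
        for phrase in phrases:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text):
                entities.append((match.start(), match.end(), match.group(0), label))
    return sorted(entities)


text = "Ada Lovelace joined Acme Corp in London."
for start, end, surface, label in tag_entities(text):
    print(f"{surface!r} [{start}:{end}] -> {label}")
```

The character offsets plus a label are the same kind of annotation a learned NER model is typically trained on, which is why a lookup like this is sometimes used to bootstrap training data.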